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Abstract 

In the present work we carry out a study of the high energy cosmic rays mass identification capabilities of a hybrid 
detector employing both fluorescence telescopes and particle detectors at ground using simulated data. It involves the 
analysis of extensive showers with zenith angles above 60 degrees making use of the joint distribution of the depth of 
maximum and muon size at ground level as mass discriminating parameters. The correlation and sensitivity to the 
primary mass are investigated. Two different techniques - clustering algorithms and neural networks - are adopted to 
classify the mass identity on an event-by-event basis. Typical results for the achieved performance of identification 
are reported and discussed. The analysis can be extended in a very straightforward way to vertical showers or can be 
complemented with additional discriminating observables coming from different types of detectors. 
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1. Introduction 

Mass composition analysis is fundamental to un- 
derstand the features observed in the cosmic ray 
energy spectrum and to test theoretical models con- 
cerning the origin and the nature of the primary 
cosmic ray radiation at the highest energies. Cur- 
rent theories predict different spectra for each mass 
component so that the knowledge of the energy 
spectra for every mass component, or at least for 
groups of components, is required in order to dis- 
criminate among the proposed models. 
At low energies (E < 10 14 eV) cosmic ray com- 
position can be measured using direct detection 
techniques, such as spectrometers and calorimeters. 
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At higher energies, mass measurements are per- 
formed with indirect techniques, which make use of 
shower parameters sensitive to the primary mass. 
The depth at which the longitudinal development 
has its maximum, A max , and both the number of 
electrons N e and muons at ground level are most 
widely used. An extensive review of the adopted 
detection techniques and mass composition results 
over the entire energy spectrum range is presented 
in [1]. 

Two kinds of approaches are generally adopted in 
composition analysis. Likelihood methods, for in- 
stance in [2] or unfolding analyses, as in [3], aim to 
infer the average mass abundances or the energy 
spectra for different mass components on the ba- 
sis of a set of parameters sensitive to the primary 
mass, without attempting to establish the mass 
of each single event. The information of the mass 
identity on an event-by-event basis would however 
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be extremely useful and would lead to considerable 
progress. For instance when studying possible cor- 
relations of the shower arrival direction with given 
astrophysical objects it is desirable to select events 
according to their mass. Some worthwhile studies 
concern the flux of a given primary, e.g. protons 
for the p-Air cross section analysis or the search of 
gamma ray or neutrino events in the hadronic back- 
ground. In all these situations pattern recognition 
methods, e.g. neural networks as in [4, 5], linear 
discriminant analysis [6], are typically employed. 
Due to the absence of features strongly correlated to 
the mass and the presence of stochastic shower-to- 
shower fluctuations, both analyses must necessarily 
deal with a limited number of mass groups, typi- 
cally four or five in first kind of analysis and two or 
three in event-by-event studies. 
In this work we study the possibility to use neural 
networks and clustering algorithms as mass classi- 
fier tools on the basis of two parameters, the depth 
of shower maximum and the number of muons, 
measured with hybrid detectors, such as the Pierre 
Auger Observatory [7] or the Telescope Array [8]. 
In particular we restrict the analysis to events mea- 
sured with very high zenith angles (>60°) at energies 
above 10 18 eV. For such inclined showers the elec- 
tromagnetic component is strongly absorbed before 
reaching ground level due to the large atmospheric 
depth traversed and the muon component domi- 
nates the signal measured at the particle detectors. 
To a rough approximation only the muon reaches 
ground level. These events therefore represent a sim- 
ple way to measure a parameter reflecting the size of 
the muon component in experiments not equipped 
with dedicated muon detectors. This is achieved, 
as detailed in [9], by fitting the measured signals 
(after subtracting a small contribution due to the 
electromagnetic contamination) to a reference pa- 
rameterization of the muon density at ground level 
obtained from simulations. The depth of maximum 
can be determined from the longitudinal profile ex- 
tracted from the light measurements made with the 
fluorescence telescopes (see for instance [10]). 
The paper is organized as follows. Section 2 de- 
scribes the data set used for this study, built from 
Conex simulations of extensive air showers, and 
addresses the correlation between the two param- 
eters (A max and muon size) and the sensitivity to 
the primary mass. In Section 3 a brief description of 
the designed neural network and clustering meth- 
ods is reported. In Section 4 their application to 
samples of simulated data and the achieved classifi- 
cation results are presented. Our main conclusions 
are summarized in Section 5. 



2. Simulated data 

2.1. Simulation strategy 

The present study is based on a sample of sim- 
ulated showers generated with the Conex tool 
[11, 12] using QgsjetOI [13] as high energy hadronic 
interaction model. Conex is an hybrid code, based 
on the combination of a Monte Carlo strategy and 
the analytic solution of the cascade equations, thus 
allowing only a one-dimensional simulation of the 
shower. It requires considerably shorter CPU times 
with respect to the ones needed by typical three- 
dimensional codes, such as Corsika [14] or Aires 
[15]. This feature makes Conex particularly suit- 
able for applications involving data observed with 
fluorescence detector telescopes. 
Proton, nitrogen and iron primaries have been sim- 
ulated according to a flat energy spectrum from 
10 18 eV to 10 20 eV and a dN/ (d cos 6) oc cos 6 zenith 
angle spectrum from 60° to 80°. About 10 6 events 
are available per each primary nucleus. 
In order to perform a composition study, a set of 
parameters sensitive to the primary mass is needed. 
In this work we assumed the depth of shower max- 
imum A max and an estimator of the total number 
of muons iV M reaching ground level for inclined 
showers, denoted as Nig. While A max is directly 
available in Conex simulations, the latter has to be 
extracted from the simulated muon profile assum- 
ing a given observation depth. For our analysis we 
fixed such depth to 1450 m, roughly corresponding 
to the altitude of the Pierre Auger Observatory. 
Unfortunately Conex provides the muon profile 
only for muons above 1 GeV, therefore we need to 
employ full Aires simulations to derive an empir- 
ical correction factor to be applied to iV„. In Fig. 
1(a) we display the ratio between the number of 
muons above 1 GeV N> 1GeV and the total number 
of muons as a function of the shower zenith 
angle. Proton results are shown in red while iron in 
blue. As one can see the effect of the muon energy 
cut is not particularly relevant for inclined showers. 
The fraction of unaccounted muons is below 10% 
muons when using Conex. An empirical parame- 
terization as a function of the zenith angle is derived 
and shown in Fig. 1(a) with solid colored lines 1 . 
After this correction the simulated is divided by 
the number of muons of the reference parameteri- 
zation of the muon density at ground, obtained for 
this location at a conventional energy of 10 19 eV [9], 

1 The energy dependence of the correction factor was also 
investigated and a negligible dependence was found. 
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Fig. 1: Fig. 1(a): 7V M correction factor for proton (in red) and iron (in blue) Aires simulations. Fig. 1(b): Scatter plot of the 
parameters for proton (red), nitrogen (green) and iron (blue). The bottom panel refers to the case of detector 
resolution applied. 



to obtain the relative muon size with respect to the 
reference energy, N^ c . As the number of muons 
in the reference parameterization depends on the 
zenith angle, an estimator Nig C is obtained which 
is basically independent of the zenith angle. 
To take into account the reconstruction effects, the 
simulated A max MC and N^ c have been smeared 
with the detector resolution. In this work we assume 
the resolution reported by the Pierre Auger Obser- 
vatory in [9, 16]. Typical values for o~x mBX are ~27 
g/cm 2 at 10 18 eV improving at <20 g/cm 2 above 
10 19 eV. The detector resolution <tn 19 has been 
found of the order of 25% at 10 185 eV improving 
at ^12% above 10 19 eV. In Fig. 1(b) we present a 
scatter plot of A max and Nig at a fixed energy of 
10 19 eV for the three available primaries the up- 
per (lower) panel ignoring (accounting) for detector 
resolution as modelled. 



2.2. Correlation of the parameters and sensitivity 
to primary mass 

It is worth investigating the correlation of the 
two parameters and evaluating an estimator of the 
mass sensitivity for both. In Fig. 2(a) we report 
the Pearson correlation coefficient of both parame- 
ters for the three available nuclei, the bottom panel 
accounting for detector resolution. As can be seen 
the two parameters are almost independent for pro- 



ton primaries while for heavier nuclei the correlation 
amounts to 0.5 for nitrogen and 0.6 for iron. In the 
bottom panel the effect of the detector resolution is 
visible and the correlation observed for nuclei is sen- 
sibly reduced. 

To estimate which of the two parameters offers the 
best separation between the proton and iron dis- 
tribution we computed the Fisher coefficient J for 
X max and Ni 9 : 

_ (mf - mp 2 

J l - [ a Fe}2 + p]2 3 ~ Xmax ' Nl9 

where vrij and c 2 are the sample mean and vari- 
ance of the proton/iron distributions of the param- 
eter j. The computed quantity provides a measure 
of power to separate two populations of proton and 
iron: large values of J correspond to a good discrim- 
ination power. In Fig. 2(b) we report the Fisher co- 
efficient as a function of energy. As can be seen the 
number of muons estimator has a better discrimi- 
nation power than A max . However, the reconstruc- 
tion of the A max parameter, especially at low ener- 
gies, is much better than that of N 19 . This results in 
the increased sensitivity of A max when applying the 
resolution to the data in the bottom panel. At the 
highest energies both parameters are almost equally 
powerful. 
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(a) X max — Nig correlation (b) X max — Nig Fisher factor 



Fig. 2: Fig. 2(a): Correlation coefficient of X ma x-iVi9 as a function of the primary energy for proton (red), nitrogen (green) 
and iron (blue). Fig. 2(b): Fisher coefficient of the Jf max and 7Vig parameters with respect to proton and iron primaries. 
Bottom panels take into consideration detector resolution effects. 



3. Analysis method 

Two different classification methods are used in 
this work, one based on the neural network technique 
and the other on clustering algorithms. In the follow- 
ing sections we briefly describe the two approaches 
and the strategies adopted to perform the analysis. 

3.1. The Clustering approach 

A simple clustering algorithm, the well-known k- 
Means [17], is employed for the classification task. 
It starts assuming an initial set of K cluster cen- 
troids which are then iteratively moved to determine 
the partition of the data that minimizes a specified 
square error function E 2 . We assumed the Euclidean 
distance as measure and the following squared error 
function: 

K n k N 

£ 2 = EEE(^--^) 2 ( 2 ) 

k=l j=l i=l 

where X ij IS the ith data vector (j=X max ,A^ig) and 
Cik are the coordinates of the K cluster centroids. 
The data sample were first preprocessed to have zero 
mean and unit variance and then divided into three 
smaller subsets. One of these subsets is used for the 
cluster training phase in which the K clusters deter- 
mined are labelled on the basis of the Monte Carlo 
information of the primary mass. The validation sub- 



sets are mainly used to choose the optimal number 
of clusters, while the last subset if finally used to 
test the partition and report the performance of the 
method. As such algorithm is known to depend on 
the initial choice of the centroids, the algorithm was 
run over the three subsets assuming each time 100 
different initial partitions and retaining the best one 
in terms of the total error. A number of clusters K > 
20 has been found to provide optimal results for our 
classification problem. 

3.2. The Neural Network approach 

A feed forward neural network (NN) is structured 
in parallel layers of neurons, connected to neurons 
in adjacent layers through weighted links. The in- 
put layer is connected to the input data vector and a 
predefined number of hidden layers process the sig- 
nal towards the output layer which returns the final 
response of the network to the presented input data. 
Each neuron i linearly transforms the data Xj from 
neurons in the previous layer according to the fol- 
lowing expression: 

m 

Zi = '^WijXj + bi (3) 

j=i 

where Wij are the weights associated to the j link 
and bi is an additive bias. The neuron output is then 
obtained by applying a transfer function f(zi) to the 
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Fig. 3: 3(a): Clustering output projected in the X ma x-A?ig plane. Proton data are shown with red dots while iron data with 
blue dots. The solid black lines represent the obtained clusters (K=20). Data have been standardized in the plot; 3(b): 
Distribution of neural network outputs in presence of two components, proton (red histogram) and iron (blue histogram), 
for the test data sample. A sample energy bin corresponding to log 10 E=[19.0-19.2] is chosen. 



neuron input z%. 

During the training phase the network weights and 
biases are iteratively adjusted to minimize an error 
function E MSE , given by the squared differences be- 
tween the desired output yi and the computed net- 
work output U: 

1 N 

£mse = -5>,(x,w)-^] 2 (4) 

i=l 

To improve the network generalization capabilities 
a regularization term is often added to the above 
error function to minimize the squared sum of the 
network weights: 

£S = 7£mse + (1-7)E^ ( 5 ) 

i=l 

where N\y are the number of weights of the network 
and 7 is a parameter responsible to guarantee a com- 
promise between the network error minimization 
and the generalization performances of the network. 
After testing several network architectures and 
choices of transfer functions, we obtained good re- 
sults using a simple network design with one hidden 
layer (3 neurons), and an output layer with one neu- 
ron. The activation functions are hyperbolic tangent 
in the hidden layer and linear in the output layer. 
The network input data have been normalized to 
zero mean and unit variance and then divided into 
three subsets, train-validation-test samples. The 
first is used to train the network. The cross valida- 
tion sample is used to stop the training at a given 



epoch to avoid over fitting the data sample used for 
learning. The last sample is finally used to test the 
trained network identification capabilities. 
Several back propagation training algorithms have 
been tested (steepest descent, conjugated gradient 
and quasi-Newton algorithms). We achieved bet- 
ter identification performances with quasi-Newton 
methods using the Broyden-Fletcher-Goldfarb- 
Shanno (BFGS) error minimization formula [18, 19]. 

4. Results 

The classification results obtained both with the 
clustering and the neural network methods are re- 
ported in Figs. 3(a) and 3(b) in a sample energy 
range log 10 E= [19-19.2]. 

In Fig. 3(a) we report the output of the clustering 
procedure for the test data sample projected in the 
X max -Nig plane. Proton data are shown with red 
dots, while iron data with blue dots. A number of 
K=20 clusters, represented by the solid black lines, 
is considered for this case. The achieved classifica- 
tion performances in terms of efficiency and purity 
are larger than 80% and are reported in Table 1. 
In Fig. 3(b) we report the distribution of the net- 
work outputs in presence of the proton (red lines) 
and iron (blue lines) test data. The performance in 
terms of classification matrix and purity relative to 
such case are reported in Table 2. Assuming a cut 
in the network output equal to 0.5, the classification 
efficiencies and purity are found to be around 90%. 
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Fig. 4: Classification efficiency (left panel) and purity (right panel) for proton and iron events shown respectively with red dots 
and blue squares. Results relative to the clustering method are shown with empty markers while those obtained with 
the neural network with filled markers. 





Classification 


Purity 




p Fe 




p 


0.84 0.10 


0.91 


Fe 


0.16 0.90 


0.88 



Table 1: Classification matrix and purity for the clustering 
method for the sample test data in the energy bin 
log 10 E= [19.0,19.2]. 



In Fig. 4 we report the classification efficiency for 
proton and iron (right panel) and the purity (left 
panel) as a function of the primary energy. Results 
obtained with the neural network (clustering) clas- 
sifier are shown with filled (empty) dots. Both meth- 
ods are able to accurately discriminate the two pre- 
sented classes over the entire data set. The neural 
network method performs slightly better than the 
clustering approach. As expected, the classification 
performances are found to slightly increase with en- 
ergy due to the improved reconstruction resolution 
for both X max and 7V lg parameters. 





Clas 


sification 


Purity 




V 


Fe 




p 


0.87 


0.09 


0.91 


Fe 


0.13 


0.91 


0.88 



Table 2: Classification matrix and purity for the neural net- 
work method for the sample test data in the energy 
bin log 10 E= [19.0,19.2]. 

To have the smallest contamination for a given class 
and match a given analysis requirement one can 
tighten the applied selection cuts, increasing at the 
same time the event rejection rate, as shown in Figs. 
5(a) and 5(c). 



In Fig. 5(a) wc report the classification efficiency 
(filled dots) and purity (empty dots) as a function 
of the network output for proton and iron data. The 
best compromise between the highest efficiency and 
purity is found around 0.55-0.60. To decrease the 
contamination from other species below few percent 
a cut of ^0.2 for proton and ^0.8 for iron must be 
assumed. 

Similarly, for the clustering method a smaller con- 
tamination can be achieved by rejecting those clus- 
ters having a classification purity below a given re- 
quirement during the training phase. We therefore 
report in Fig. 5(b) the results as a function of the 
cluster purity obtained for the training data set. The 
steps observed in the plots are due to the fact that 
for a given purity threshold an entire set of events 
belonging to a cluster below threshold are rejected. 
The effect is more pronounced when a small number 
of clusters is assumed in the analysis. 
The presented results so far assumed that no other 
species other than proton and iron is present in the 
flux. If an intermediate component, like nitrogen, is 
included in the data and both classifier are trained 
to recognize three species, the number of classifier 
parameters needed to achieve optimal results must 
be increased. The obtained performances deteriorate 
significantly in a 3-component situation. 

For the extreme components the identification ef- 
ficiency and purity drops to a level of 75%, while for 
the intermediate component it is of order 60%. It is 
clear that with current parameter sensitivity to the 
mass and reconstruction resolution it is not possible 
to efficiently discriminate more than two masses. We 
therefore quantify in Figs. 5(b), 5(d) the increased 



(i 



CO 

u 

'55 
w 
cc 

O 




proton 

-efficiency 
•purity 



3 
Q. 

§ 0.8 
C 

H 0.6 
0> 

§ 0.4 

TO 
O 

j£ 0.2 
(O 
(O 

.to 



0.2 


0.4 0.6 0.8 1 

NNcut 


(a) NN 


- p+Fe data 




proton 

-efficiency 
-purity 


iron 

• efficiency 
a purity 


~i , , , i , 


I t t i T i i t I t i i 1 



0.2 



0.4 



0.6 0.8 1 

Cluster purity 



to 
o 

'C5 
</> 

TO 

o 



0.2- proton iron 

- -efficiency -efficiency 

~ « purity o purity 

0-, 



3 
CL 

§ 0.8 
C 

<r> 

ig 0.6 

§ 0.4 

TO 
O 
*t 

Si 
</> 

TO 



0.2 


0.4 0.6 0.8 1 




NNcut 


(b) NN - 


p+N+Fe data 










_ CDOQOQQOeOGQOOQOOt 


.^1 \ 


proton 


iron \ 


-efficiency 


-efficiency L 


- -purity 


i purity \„ 


~i , , , i , 


i , , , i , , , i , , , i 



0.2 



0.4 



0.6 0.8 1 

Cluster purity 



(c) Clustering - p+Fe data 



(d) Clustering - p+N+Fe data 



Fig. 5: Top panels: Classification efficiency (filled markers) and purity (empty markers) as a function of the cut applied in the 
neural network output. Proton and iron results are respectively shown with red dots and blue squares. Fig. 5(a) refers 
to a 2-components flux made by proton and iron, while in Fig. 5(b) nitrogen is added to the data sample; Bottom 
panels: Classification efficiency (filled markers) and purity (empty markers) as a function of the cluster purity for the 
clustering classifier method. 



contamination in the proton and iron classified sam- 
ples due to the presence of nitrogen. For the neural 
network an optimal compromise is found assuming 
a cut equal to 0.2 and 0.8. The corresponding effi- 
ciency and purity are reduced at the 80% level. How- 
ever one can obtain smaller contamination by further 
tightening the applied cuts at the cost of reducing 
the detection efficiency. For the clustering method 
the presence of an intermediate component rises the 
contamination in the proton and iron sample to val- 
ues around 40-50%. To recover the contamination 
observed with proton and iron components only a 
large number of clusters must be rejected. 



5. Summary 

In the present work we propose two alternative 
strategies to the problem of the high energy cosmic 
ray mass identification on a event-by-event basis. 
One is based on the neural network technique while 
the other on clustering algorithms. The perfor- 
mances of both classifiers have been studied with 
simulated showers generated with the Conex tool. 
The mass discrimination is based on two parame- 
ters, the depth of shower maximum X ma _ x and an 
estimator of the muon number A^g reconstructed 
in very inclined showers. Realistic reconstruction 
resolutions have been assumed for both parameters. 
The analysis is in particular optimized for events 
recorded by hybrid detectors with zenith angles 



above 60°. Nevertheless it can be extended in a 
straightforward way to nearly vertical events, pro- 
vided that an alternative muon number parameter 
estimator is introduced. 

We have found very good identification perfor- 
mances for both methods in the case of a 2- 
component flux of proton and iron nuclei. The 
expected misclassification are found of the order of 
~10-15%, decreasing with energy, with slightly bet- 
ter performances exhibited by the neural network 
classifier. 

A significant loss of efficiency is observed when 
training the classifiers to recognize a flux with an 
intermediate mass component is added. In this sit- 
uation the expected contamination affecting the 
proton and iron identification has been reported. 
No significant performance degradation has been 
observed with an alternative data sample generated 
using a different hadronic interaction model, such 
as Epos. 

The present method, as it is, can be applied also to 
other typical discrimination problems in cosmic ray 
physics, such as hadron-gamma or hadron-neutrino 
event discrimination. In such cases the classification 
performances are expected to increase given the 
better separation between the two classes. Other 
feasible applications include the selection of a given 
mass group from the background, as it could be 
done for the selection of pure samples of protons for 
correlation analysis with astrophysical objects or to 
derive estimates of the proton- Air cross section. 
In this work we made use of supervised methods. An 
implicit assumption is made, namely that the mea- 
sured data must be well described or at least brack- 
eted by the hadronic model predictions adopted 
in the classifiers. Surprisingly, this is the case for 
most of the shower observables so far measured at 
such extreme energies, i.e. see [1]. Furthermore, the 
differences among different models are going to be 
significantly reduced with incoming model releases 
accounting for recent LHC measurements, for in- 
stance see [20]. Recently a significant discrepancy 
between data and Monte Carlo simulations has 
been reported in the muon number estimator both 
for vertical [21] and for very inclined showers [9]. 
For this reason we are currently developing a semi- 
supervised classifier, partially using the strong con- 
straints offered by the available models. Results and 
performance comparison with supervised methods 
will be reported in a forthcoming paper. 
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